A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
In recent years, significant efforts have been made to develop kernels that can be applied to sequence data such as DNA, text, speech, video and images. The Fisher kernel and similar variants have been suggested as good ways to combine an underlying generative model in the feature space with discriminative classifiers such as SVMs. In this paper we suggest an alternative procedure to the Fisher kernel for systematically finding kernel functions that naturally handle variable-length sequence data in multimedia domains. In particular, for domains such as speech and images, we explore the use of kernel functions that take full advantage of well-known probabilistic models such as Gaussian mixtures and single full-covariance Gaussian models. We derive a kernel distance based on the Kullback-Leibler (KL) divergence between generative models.
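For the single full-covariance Gaussian case the abstract mentions, a KL-based kernel can be sketched as follows. This is a minimal illustration, assuming the common construction K(p, q) = exp(-a · (D(p‖q) + D(q‖p))) with the closed-form KL divergence between multivariate Gaussians; the scaling factor `a` and the function names are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """Closed-form KL divergence D(N(mu0,S0) || N(mu1,S1))
    between two multivariate Gaussians."""
    d = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - d
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def kl_kernel(mu0, S0, mu1, S1, a=1.0):
    """Kernel from the symmetrized KL divergence:
    exp(-a * (D(p||q) + D(q||p))). Identical models give K = 1."""
    sym = kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
    return np.exp(-a * sym)
```

Because the symmetrized divergence is zero only for identical models, the kernel attains its maximum of 1 there and decays as the generative models grow apart, which is the behavior an SVM kernel over model space needs.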
Evolutionary Design of the Memory Subsystem
Álvarez, Josefa Díaz, Risco-Martín, José L., Colmenar, J. Manuel
This impact is estimated at about 50% of the total energy consumption of the chip [1], which makes the memory subsystem one of the most important targets for improving both performance and energy consumption. Concerns such as thermal issues and high energy consumption can cause significant performance degradation, as well as irreversible damage to the devices, thereby increasing the energy cost. Previous work has shown that saving energy in the memory subsystem can effectively control transistor aging effects and significantly extend the lifetime of the internal structures [2]. Technological changes, combined with the development of communications, have led to the great expansion of mobile devices such as smartphones and tablets. Mobile devices have evolved rapidly to adapt to new requirements, giving support to multimedia applications. These devices are built on embedded systems, which are mainly battery-powered and usually have fewer computing resources than desktop systems. Additionally, multimedia applications are usually memory-intensive, so they have high performance requirements, which imply high energy consumption. These features increase the pressure on the whole memory subsystem. Processor registers, smaller in size, work at the same speed as the processor and consume less energy than other levels of the memory subsystem.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Spain > Extremadura (0.04)
SINGA-Easy: An Easy-to-Use Framework for MultiModal Analysis
Xing, Naili, Yeung, Sai Ho, Cai, Chenghao, Ng, Teck Khim, Wang, Wei, Yang, Kaiyuan, Yang, Nan, Zhang, Meihui, Chen, Gang, Ooi, Beng Chin
Deep learning has achieved great success in a wide spectrum of multimedia applications such as image classification, natural language processing and multimodal data analysis. Recent years have seen the development of many deep learning frameworks that provide a high-level programming interface for users to design models, conduct training and deploy inference. However, it remains challenging to build an efficient end-to-end multimedia application with most existing frameworks. Specifically, in terms of usability, it is demanding for non-experts to implement deep learning models, obtain the right settings for the entire machine learning pipeline, manage models and datasets, and exploit external data sources all together. Further, in terms of adaptability, elastic computation solutions are much needed as the actual serving workload fluctuates constantly, and scaling the hardware resources to handle the fluctuating workload is typically infeasible. To address these challenges, we introduce SINGA-Easy, a new deep learning framework that provides distributed hyper-parameter tuning at the training stage, dynamic computational cost control at the inference stage, and intuitive user interactions with multimedia contents facilitated by model explanation. Our experiments on the training and deployment of multi-modality data analysis applications show that the framework is both usable and adaptable to dynamic inference loads. We implement SINGA-Easy on top of Apache SINGA and demonstrate our system with the entire machine learning life cycle.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > New York > New York County > New York City (0.04)
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
Abreu, Raphael, Santos, Joel dos, Bezerra, Eduardo
In mulsemedia applications, traditional media content (text, image, audio, video, etc.) can be related to media objects that target other human senses (e.g., smell, haptics, taste). Such applications aim at bridging the virtual and real worlds through sensors and actuators. Actuators are responsible for the execution of sensory effects (e.g., wind, heat, light), which produce sensory stimulations on the users. In these applications, sensory stimulation must happen in a timely manner with regard to the other traditional media content being presented. For example, at the moment an explosion is presented in the audiovisual content, it may be adequate to activate actuators that produce heat and light. It is common to use a declarative multimedia authoring language to relate the timestamp at which each media object is to be presented to the execution of some sensory effect. One problem in this setting is that the synchronization of media objects and sensory effects is done manually by the author(s) of the application, a process which is time-consuming and error-prone. In this paper, we present a bimodal neural network architecture to assist the synchronization task in mulsemedia applications. Our approach is based on the idea that audio and video signals can be used simultaneously to identify the timestamps at which some sensory effect should be executed. Our learning architecture combines audio and video signals for the prediction of scene components. For evaluation purposes, we construct a dataset based on Google's AudioSet. We provide experiments to validate our bimodal architecture. Our results show that the bimodal approach produces better results than several variants of unimodal architectures.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
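The bimodal idea above, combining audio and video signals to predict scene components, can be illustrated with a late-fusion sketch. This is a minimal stand-in, not the paper's architecture: the feature dimensions, the untrained weights, and the assumption that each modality arrives as a precomputed embedding (the paper instead learns encoders end-to-end) are all illustrative:

```python
import numpy as np

def late_fusion_predict(audio_feat, video_feat, W, b):
    """Concatenate per-modality embeddings and apply a linear
    scene-component classifier with a softmax output.
    In a real bimodal network each branch would be a learned
    encoder; here we fuse fixed feature vectors for illustration."""
    fused = np.concatenate([audio_feat, video_feat])
    logits = W @ fused + b
    # numerically stable softmax over scene-component classes
    z = np.exp(logits - logits.max())
    return z / z.sum()
```

Training such a model on clips labeled with the target scene components would then let the predicted class probabilities flag candidate timestamps for triggering sensory effects.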
A Topic Model Approach to Multi-Modal Similarity
Troelsgård, Rasmus, Jensen, Bjørn Sand, Hansen, Lars Kai
Calculating similarities between objects defined by many heterogeneous data modalities is an important challenge in many multimedia applications. We use a multi-modal topic model as a basis for defining such a similarity between objects. We propose to compare the resulting similarities from different model realizations using the non-parametric Mantel test. The approach is evaluated on a music dataset.
- North America > United States > New York > New York County > New York City (0.16)
- Europe > Denmark (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- Media > Music (0.70)
- Leisure & Entertainment (0.70)
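The non-parametric Mantel test used in the abstract above compares two distance (or similarity) matrices by correlating their off-diagonal entries and assessing significance under joint row/column permutations. A minimal numpy sketch; the function name and the two-sided permutation p-value convention are our assumptions:

```python
import numpy as np

def mantel_test(D1, D2, n_perm=999, rng=None):
    """Mantel test: Pearson correlation between the upper triangles
    of two square distance matrices, with a permutation p-value
    obtained by jointly permuting the rows and columns of D2."""
    rng = np.random.default_rng(rng)
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)          # off-diagonal upper triangle
    x = D1[iu]
    r_obs = np.corrcoef(x, D2[iu])[0, 1]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        Dp = D2[np.ix_(p, p)]             # permute rows and columns together
        if abs(np.corrcoef(x, Dp[iu])[0, 1]) >= abs(r_obs):
            count += 1
    # add-one correction so the p-value is never exactly zero
    return r_obs, (count + 1) / (n_perm + 1)
```

Applied to the paper's setting, D1 and D2 would be the object-by-object similarity matrices produced by two different topic-model realizations, so the test indicates whether the two realizations induce significantly related similarity structures.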